TODO

REINCLUDE SELF ESTEEM VAHEY STUDY? check clinical exclusions

# dependencies
library(tidyverse)
library(pwr)
library(irr)
library(metafor)
library(forestploter)
library(janitor)
library(greekLetters)
library(knitr)
library(kableExtra)

dir.create("models")
dir.create("plots")

# function to round all numeric vars in a data frame
round_df <- function(df, n_digits = 3) {
  df %>% mutate_if(is.numeric, janitor::round_half_up, digits = n_digits)
}

# apa format p value -----
apa_p_value <- function(p){
  p_formatted <- ifelse(p >= 0.001, paste("=", round_half_up(p, 3)),
                        ifelse(p < 0.001, "< .001", NA))
  p_formatted <- gsub(pattern = "0.", replacement = ".", x = p_formatted, fixed = TRUE)
  p_formatted
}

# meta results string
meta_effect_string <- function(fit, predictions){
  paste0("k = ", fit$k, ", r = ", predictions$pred,
         ", 95% CI [", predictions$ci.lb, ", ", predictions$ci.ub, "]",
         ", 95% CR [", predictions$pi.lb, ", ", predictions$pi.ub, "]",
         ", p = ", signif(2*pnorm(-abs(fit$zval)), digits = 1))  # exact p value from z score
}

# heterogeneity strings
# heterogeneity_string <- function(fit){
#   #"Q(df=", fit$k - 1, ") = ", janitor::round_half_up(fit$QE, 2),
#   #        ", p ", ifelse(fit$QEp < 0.001, "< .001", paste0("= ", as.character(janitor::round_half_up(fit$QEp, 3)))),
#   #        ", ", 
#   if(!is.null(fit$I2)){
#     heterogeneity_string <- 
#       paste0("Heterogeneity: ", greekLetters::greeks("tau^2"), " = ", janitor::round_half_up(fit$tau2, 2),
#              ", ", greekLetters::greeks("I^2"), " = ", janitor::round_half_up(fit$I2, 2),
#              "%, ", greekLetters::greeks("H^2"), " = ",janitor::round_half_up(fit$H2, 2))
#   } else {
#     heterogeneity_string <- 
#       paste0("Heterogeneity: ", greekLetters::greeks("tau^2"), " = ", janitor::round_half_up(fit$tau2, 2))
#   }
#   return(heterogeneity_string)
# }
# 
# footnote   = heterogeneity_string(fit_new)

# notation off
options(scipen = 999)

# plot theme
custom_theme <- 
  forest_theme(base_size    = 10,
               ci_lty       = 1,
               ci_lwd       = 1.5,
               ci_Theight   = 0.2,
               summary_fill = "black",
               summary_col  = "black",
               footnote_cex = 0.8)

# get data
data_vahey <- read.csv("../data/data_vahey_et_al_2015.csv")

Computational replication of original analylses

Power analyses

Calculated from meta analytic effect sizes reported in forest plot and text.

Vahey et al.’s reported meta effect size estimate was r = .45, 95% CI [.40, .54], 95% CR [.23, .67]. Using this effect size, they conducted a power analysis.

  • They reported that: “based on the current meta-effect [r = .40], a sample size of 29 participants would provide a study with a statistical power of .80 when examining the statistical significance of first-order Pearson’s r correlations between clinically-focused IRAP effects and corresponding criterion variables” (p.63); and
  • “Adopting a conservative approach in favour of controlling for overly optimistic publication biases, the most recent recommendation is to calculate sample size requirements not in terms of a given meta-effect, but rather in terms of the lower bound of its associated confidence interval (Perugini, Gallucci, & Costantini, 2014). Given that we obtained a confidence interval of (.40, .54) around the present meta-effect, Perugini et al.’s approach implies that a sample size of at least N = 37 would be required in order to achieve a statistical power of .80 when testing a continuous first-order correlation between a clinically-focused IRAP effect and a given criterion variable” (p.63).

I used the R package pwr to attempt to reproduce these values.

  • Minimum sample size using r = .45: 29 participants to have 80% power, alpha = .05, one-tailed.
  • Minimum sample size using lower CI r = .40: 37 participants to have 80% power, alpha = .05, one-tailed.
  • The results of Vahey et al.’s power analyses could therefore be computationally reproduced if one uses the more liberal one-tailed test, which was not specified by Vahey et al. 

However, Vahey et al.’s analytic choices here could be questioned: One-tailed correlations tests with alpha = .05 are very uncommon when reporting the significance of correlations, and regression analyses require two-sided testing. A two-tailed test with alpha = .05 would more correspond far more closely to modal research practices. I therefore recomputed sample size estimates using these parameters:

  • Minimum sample size using r = .45: 36 participants to have 80% power, alpha = .05, two-tailed.
  • Minimum sample size using lower CI r = .40: 46 participants to have 80% power, alpha = .05, two-tailed.
  • This suggested sample sizes using more common assumptions are 24% higher than Vahey et al.’s reported estimates.

Meta-analytic effect size and intervals

Calculated from weighted average effect sizes reported in forest plot.

The power analyses relied on the accuracy of the meta-analytic effect size. So, I then attempted to computationally reproduce the meta-analytic effect size from the effect sizes, confidence intervals, credibility intervals, and sample sizes reported in Vahey et al.’s forest plot and their description of their meta-analytic method in their manuscript.

In an email, Vahey stated that they performed a Hunter & Schmidt meta analysis, following the guide of Field & Gillett (2010) and using their code.

On inspection, Field & Gillett (2010) make a distinction between the Hunter & Schmidt method, which is distinctive for reporting credibility intervals rather than confidence intervals, and the Hedges and colleagues’ method, which is distinctive for using Fisher’s r-to-z transformations.

Vahey et al. (2015) do not report applying any transformations to their data. However, Vahey et al.’s (2015) individual effect sizes in their forest plot have asymmetric confidence intervals. This implies that a transformation was performed. It is therefore possible that Vahey et al. combined the two methods (and code) provided by Field & Gillett (2010). I first fit a Hunter & Schmidt method, then a combined Hunter & Schmidt and Hedges’ and colleagues method.

Hunter & Schmidt method using Field & Gillett’s (2010) SPSS script

See “computational replication of meta-analysis via SPSS script” folder.

Summary of results:

results_spss <- tibble(
  k_studies = c(15, 15),
  total_n = c(494, 494),
  pred = c(.45, .4670), 
  ci.lb = c(.40, .1955),
  ci.ub = c(.54, .7392),
  pi.lb = c(.23, .4674),
  pi.ub = c(.67, .4674)
)%>%
  round_df(2) 

results_spss %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
k_studies total_n pred ci.lb ci.ub pi.lb pi.ub
15 494 0.45 0.4 0.54 0.23 0.67
15 494 0.47 0.2 0.74 0.47 0.47
  • Meta estimate is not the same as Vahey et al.’s forest plot.
  • Meta confidence intervals are not same as Vahey et al.’s stated in text.
  • Meta credibility/prediction intervals are not the same as Vahey et al.’s credibility intervals in their forest plot.

Hunter & Schmidt method using metafor R package

Information about implementation of Hunter & Schmidt-style meta-analysis in metafor from here.

total_n <- data_vahey %>%
  distinct(article, n_forest_plot) %>%
  summarize(total_n = sum(n_forest_plot, na.rm = TRUE)) %>%
  pull(total_n)

# calculate variances
data_recalculated_variance <- 
  escalc(measure = "COR", 
         ri      = mean_r_forest_plot_numeric, 
         ni      = n_forest_plot, 
         data    = data_vahey, 
         vtype   = "AV") %>%
  drop_na() %>%
  select(article, article_abbreviated,
         yi, vi, ni = n_forest_plot) %>%
  mutate(lower = yi - sqrt(vi)*1.96,
         upper = yi + sqrt(vi)*1.96)

# fit model
fit_reproduced <- 
  rma(yi      = yi, 
      vi      = vi, 
      weights = ni, # Hunter Schmidt method requires weighting by N
      method  = "HS", # Hunter Schmidt method
      data    = data_recalculated_variance,
      slab    = article_abbreviated)

predictions_reproduced <- 
  predict(fit_reproduced) %>%
  as.data.frame() %>%
  select(-se) 

predictions_reproduced_printing <- predictions_reproduced %>%
  round_df(2)

# predictions_reproduced %>%
#   round_df(2) %>%
#   kable() %>%
#   kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)

# plot
data_forest <- data_recalculated_variance %>%
  drop_na() %>%
  select(article_abbreviated,
         n = ni,
         r = yi,
         lower,
         upper) %>%
  # bind_rows(tibble(article_abbreviated = "Meta",
  #                  n = 35, # arbitrary number to make the diamond an appropriate size
  #                  r = predictions_reproduced$pred,
  #                  lower = predictions_reproduced$pi.lb,
  #                  upper = predictions_reproduced$pi.ub)) %>%
  bind_rows(tibble(article_abbreviated = c("Meta (confidence interval)", "Meta (credibility interval)"),
                   n = 35, # arbitrary number to make the diamond an appropriate size
                   r = c(predictions_reproduced$pred, predictions_reproduced$pred),
                   lower = c(predictions_reproduced$ci.lb, predictions_reproduced$pi.lb),
                   upper = c(predictions_reproduced$ci.ub, predictions_reproduced$pi.ub))) %>%
  mutate(size = n/sum(n),
         n = ifelse(str_detect(article_abbreviated, "Meta"), total_n, n),
         article_abbreviated = fct_relevel(article_abbreviated,
                                           "Carpenter et al. (2012)", 
                                           "Dawson et al. (2009)", 
                                           "Hooper et al, (2010)", 
                                           "Hussey & Barnes-Holmes (2012)", 
                                           "Kishita et al. (2014)", 
                                           "Kosnes et al. (2013)", 
                                           "Nicholson & Barnes-Holmes (2012a)", 
                                           "Nicholson & Barnes-Holmes (2012b)", 
                                           "Nicholson, Dempsey et al. (2013)", 
                                           "Nicholson, McCourt et al. (2013)", 
                                           "Parling et al. (2011)", 
                                           "Remue et al. (2013)", 
                                           "Timko et al. (2010, Study 1)", 
                                           "Vahey et al. (2009)", 
                                           "Vahey et al. (2010)",
                                           "Meta")) %>%
  mutate(` ` = paste(rep(" ", 30), collapse = " ")) %>%
  select(Article = article_abbreviated, ` `, r, lower, upper, n, size) %>%
  round_df(2)

p_reproduced <- 
  forestploter::forest(data       = select(data_forest, -size),
                       est        = data_forest$r,
                       lower      = data_forest$lower, 
                       upper      = data_forest$upper,
                       sizes      = 1/data_forest$size,
                       #is_summary = c(rep(FALSE, nrow(data_forest)-1), TRUE),
                       is_summary = c(rep(FALSE, nrow(data_forest)-2), TRUE, TRUE),
                       ci_column  = 2,
                       ref_line   = 0,
                       theme      = custom_theme,
                       xlim       = c(-0.3, 1.1),
                       ticks_at   = c(-.2, 0, .2, .4, .6, .8, 1.0))

p_reproduced

ggplot2::ggsave(filename = "plots/forest plot Hunter & Schmidt method.pdf", 
                plot = p_reproduced,
                device = "pdf",
                width = 7.5, 
                height = 4.5, 
                units = "in")

Meta effect: k = 15, r = 0.47, 95% CI [0.4, 0.54], 95% CR [0.4, 0.54], p = 0.000000000000000000000000000000000000004.

  • Individual estimates are the same as Vahey et al.’s forest plot.
  • Individual estimates’ confidence intervals are not the same as Vahey et al.’s forest plot.
  • Meta estimate is not the same as Vahey et al.’s forest plot.
  • Meta confidence intervals are same as Vahey et al.’s stated in text.
  • Meta credibility/prediction intervals are not the same as Vahey et al.’s credibility intervals in their forest plot.

Hunter & Schmidt method with Hedges’ and colleagues transformations using metafor R package

Fisher’s r-to-z transformation and back transformation, with HS estimator. No Overton (1998) transformation.

# calculate variances
data_recalculated_variance_transformed <- 
  escalc(measure = "ZCOR", 
         ri      = mean_r_forest_plot_numeric, 
         ni      = n_forest_plot, 
         data    = data_vahey, 
         vtype   = "AV") %>%
  drop_na() %>%
  select(article, article_abbreviated,
         yi, vi, ni = n_forest_plot) %>%
  mutate(lower = yi - sqrt(vi)*1.96,
         upper = yi + sqrt(vi)*1.96)

# fit model
fit_reproduced_transformed <- 
  rma(yi      = yi, 
      vi      = vi, 
      weights = ni, # Hunter Schmidt method requires weighting by N
      method  = "HS", # Hunter Schmidt method
      data    = data_recalculated_variance_transformed,
      slab    = article_abbreviated)

predictions_reproduced_transformed <- 
  predict(fit_reproduced_transformed) %>%
  as.data.frame() %>%
  select(-se)

predictions_reproduced_transformed_backtransformed <- predictions_reproduced_transformed %>%
  mutate_all(transf.ztor) %>%
  round_df(2)

# plot
data_forest_transformed <- data_recalculated_variance_transformed %>%
  select(article_abbreviated,
         n = ni,
         r = yi,
         lower,
         upper) %>%
  # bind_rows(tibble(article_abbreviated = "Meta",
  #                  n = 35, # arbitrary number to make the diamond an appropriate size
  #                  r = predictions_reproduced_transformed$pred,
  #                  lower = predictions_reproduced_transformed$pi.lb,
  #                  upper = predictions_reproduced_transformed$pi.ub)) %>%
  bind_rows(tibble(article_abbreviated = c("Meta (confidence interval)", "Meta (credibility interval)"),
                   n = 35, # arbitrary number to make the diamond an appropriate size
                   r = c(predictions_reproduced_transformed$pred, predictions_reproduced_transformed$pred),
                   lower = c(predictions_reproduced_transformed$ci.lb, predictions_reproduced_transformed$pi.lb),
                   upper = c(predictions_reproduced_transformed$ci.ub, predictions_reproduced_transformed$pi.ub))) %>%
  mutate(r = transf.ztor(r),
         lower = transf.ztor(lower),
         upper = transf.ztor(upper)) %>%
  mutate(size = n/sum(n),
         n = ifelse(str_detect(article_abbreviated, "Meta"), total_n, n),
         article_abbreviated = fct_relevel(article_abbreviated,
                                           "Carpenter et al. (2012)", 
                                           "Dawson et al. (2009)", 
                                           "Hooper et al, (2010)", 
                                           "Hussey & Barnes-Holmes (2012)", 
                                           "Kishita et al. (2014)", 
                                           "Kosnes et al. (2013)", 
                                           "Nicholson & Barnes-Holmes (2012a)", 
                                           "Nicholson & Barnes-Holmes (2012b)", 
                                           "Nicholson, Dempsey et al. (2013)", 
                                           "Nicholson, McCourt et al. (2013)", 
                                           "Parling et al. (2011)", 
                                           "Remue et al. (2013)", 
                                           "Timko et al. (2010, Study 1)", 
                                           "Vahey et al. (2009)", 
                                           "Vahey et al. (2010)",
                                           "Meta")) %>%
  mutate(` ` = paste(rep(" ", 30), collapse = " ")) %>%
  select(Article = article_abbreviated, ` `, r, lower, upper, n, size) %>%
  round_df(2)

p_reproduced_transformed <- 
  forestploter::forest(data       = select(data_forest_transformed, -size),
                       est        = data_forest_transformed$r,
                       lower      = data_forest_transformed$lower, 
                       upper      = data_forest_transformed$upper,
                       sizes      = 1/data_forest_transformed$size,
                       #is_summary = c(rep(FALSE, nrow(data_forest)-1), TRUE),
                       is_summary = c(rep(FALSE, nrow(data_forest)-2), TRUE, TRUE),
                       ci_column  = 2,
                       ref_line   = 0,
                       theme      = custom_theme,
                       xlim       = c(-0.3, 1.1),
                       ticks_at   = c(-.2, 0, .2, .4, .6, .8, 1.0))

p_reproduced_transformed

ggplot2::ggsave(filename = "plots/forest plot Hunter & Schmidt method with Fisher's r-to-z transformations.pdf", 
                plot = p_reproduced_transformed,
                device = "pdf",
                width = 7.5, 
                height = 4.5, 
                units = "in")

Meta effect: k = 15, r = 0.47, 95% CI [0.4, 0.54], 95% CR [0.4, 0.54], p = 0.000000000000000000000000002.

  • Individual estimates are the same as Vahey et al.’s forest plot.
  • Individual estimates’ confidence intervals are the same as Vahey et al.’s forest plot.
  • Meta estimate is not the same as Vahey et al.’s forest plot.
  • Meta confidence intervals are same as Vahey et al.’s stated in text.
  • Meta credibility/prediction intervals are not the same as Vahey et al.’s credibility intervals in their forest plot.

Summary

bind_rows(
  results_spss %>% select(-k_studies, -total_n),
  predictions_reproduced_printing,
  predictions_reproduced_transformed_backtransformed
) %>%
  rownames_to_column(var = "analysis") %>%
  mutate(analysis = case_when(analysis == "...1" ~ "Vahey et al.",
                              analysis == "...2" ~ "Attempted computational reproduction: Field & Gillett's implementation of Hunter & Schmidt",
                              analysis == "...3" ~ "Attempted computational reproduction: Vichtbauer's implementation of Hunter & Schmidt",
                              analysis == "...4" ~ "Attempted computational reproduction: Vichtbauer's implementation of Hunter & Schmidt + Fisher's r-to-z transformations")) %>%
  rename(estimate = pred, ci_lower = ci.lb, ci_upper = ci.ub, cr_lower = pi.lb, cr_upper = pi.ub) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
analysis estimate ci_lower ci_upper cr_lower cr_upper
Vahey et al.  0.45 0.4 0.54 0.23 0.67
Attempted computational reproduction: Field & Gillett’s implementation of Hunter & Schmidt 0.47 0.2 0.74 0.47 0.47
Attempted computational reproduction: Vichtbauer’s implementation of Hunter & Schmidt 0.47 0.4 0.54 0.40 0.54
Attempted computational reproduction: Vichtbauer’s implementation of Hunter & Schmidt + Fisher’s r-to-z transformations 0.47 0.4 0.54 0.40 0.54

I report three attempts to reproduce Vahey et al.’s results using (a) Field & Gillett’s (2010) SPSS script for a Hunter & Schmidt style meta-analysis, which Vahey referred me to use to reproduce their results, (b) a close implementation of the same Hunter & Schmidt style meta-analysis in R using the well-established metafor package, and (c) the same Hunter& Schmidt meta-analysis in r/metafor adding the transformations Field & Gillett recommended for a different analytic strategy (i.e., Hedges’ and colleagues use of Fisher’s r-to-z transformations).

None of the three analytic strategies/implementations fully reproduced Vahey et al.’s results.

All three return the same meta-estimate of effect size: r = .47, where Vahey et al reported r = .45.

Both of the metafor implementations produce the same confidence intervals as reported in Vahey et al, but Field & Gillett’s SPSS script produces very different ones.

Only the analysis that included Fishers’ r-to-z transformations can account for and replicate the asymmetric confidence intervals on individual effect sizes that Vahey et al. reported in their forest plot. At minimum, it appears that Vahey et al. employed (and did not report using) a mixture of the Hunter & Schmidt method and the Hedges’ method of transformation.

No implementation produced similar credibility intervals to Vahey et al.

Average effect sizes

Reported in forest plot, calculated from individual effect sizes in supplementary materials.

Next, I attempted to reproduce the weighted-mean effect sizes reported in Vahey et al.’s forest plot from the individual effect sizes they reported in their supplementary materials.

Vahey et al. reported that they weighted by degrees of freedom, and I do the same.

I noted that some of the confidence intervals in Vahey et al.’s forest plot are asymmetrical around the point estimate. This is uncommon and not accounted for by Vahey et al. detailing of how they calculated the effect sizes and their confidence intervals. However, I take them at face value as they are the most detailed data available to work from.

# recalculate and compare
data_weighted_effect_sizes <- data_vahey %>%
  select(article = article_abbreviated, 
         r_supplementary, 
         df_supplementary) %>%
  group_by(article) %>%
  mutate(mean_df = mean(df_supplementary)) %>%
  ungroup() %>%
  mutate(r_weighted_by_df = r_supplementary*(df_supplementary/mean_df)) %>%
  group_by(article) %>%
  summarize(r_recalculated_weighted_mean = round_half_up(mean(r_weighted_by_df), 2)) %>%
  left_join(data_vahey %>% 
              select(article = article_abbreviated, 
                     mean_r_forest_plot_numeric) %>%
              drop_na(),
            by = "article") %>%
  mutate(congruence = ifelse(janitor::round_half_up(r_recalculated_weighted_mean, 2) == 
                               janitor::round_half_up(mean_r_forest_plot_numeric, 2), "Congruent", "Incongruent"),
         congruence = fct_relevel(congruence, "Congruent", "Incongruent"),
         difference = r_recalculated_weighted_mean - mean_r_forest_plot_numeric)

# plot
p_comparison_reaveraged <- 
  ggplot(data_weighted_effect_sizes, aes(mean_r_forest_plot_numeric, r_recalculated_weighted_mean)) +
  geom_abline(slope = 1, linetype = "dotted") +
  geom_point(aes(color = congruence, shape = congruence), size = 2.25) +
  theme_classic() +
  scale_color_viridis_d(begin = 0.25, end = 0.75) + 
  theme(legend.position = "top", # c(0.25, 0.85),
        legend.title = element_blank()) + 
        #legend.box.background = element_rect(colour = "black")) +
  xlim(0.35, 0.75) +
  ylim(0.35, 0.75) +
  xlab("Reported in Vahey et al.'s (2015)\nforest plot") +
  ylab("Recalculated from Vahey et al.'s (2015)\nsupplementary materials") +
  annotate("text", x = .66, y = .45, size = 3, label = "Overestimates validity") +
  annotate("text", x = .45, y = .66, size = 3, label = "Underestimates valdiity")

p_comparison_reaveraged

ggplot2::ggsave(filename = "plots/scatter plot comparing effect sizes in forest plot to reaveraged from supplementary materials.pdf", 
                plot = p_comparison_reaveraged,
                device = "pdf",
                width = 4.5, 
                height = 4.5, 
                units = "in")

 # table
data_weighted_effect_sizes %>%
  summarize(n_incongruent = sum(ifelse(congruence == "Incongruent", TRUE, FALSE)),
            percent_incongruent = round_half_up(mean(ifelse(congruence == "Incongruent", TRUE, FALSE)), 2)*100) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
n_incongruent percent_incongruent
2 13
data_weighted_effect_sizes %>%
  filter(difference != 0) %>%
  select(article, r_recalculated_weighted_mean, mean_r_forest_plot_numeric, difference, congruence) %>%
  round_df(2) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
article r_recalculated_weighted_mean mean_r_forest_plot_numeric difference congruence
Nicholson, Dempsey et al. (2013) 0.46 0.48 -0.02 Incongruent
Vahey et al. (2009) 0.58 0.53 0.05 Incongruent

The weighted effect sizes reported in Vahey et al.’s forest plot could not be computationally reproduced from the individual effect sizes and degrees of freedom reported in their supplementary materials.

Many values were congruent, but a minority differed (k = 2 [13%]). Table includes those that differed.

Extraction and conversion of included individual effect sizes

Attempted to reextract the individual effect sizes listed in the supplementary materials and convert them to Pearson’s r.

Note that only a subset were possible to reextract/convert, for the following reasons:

  • Vahey et al. appear to have treated the \(\eta^2_p\) effect size reported in original papers as if it was equivalent to \(\eta^2\), which it is not: (a) \(\eta^2\) has a relatively simpler mathematical transformation to Pearson’s r, which Vahey et al. appear to have applied to \(\eta^2_p\), which is not possible to convert to Pearson’s r due to the fact that it is a partial correlation (e.g., Hooper et al., 2010; Hussey & Barnes-Holmes, 2012).
  • In some cases, effect sizes reported in Vahey et al.’s supplementary materials did not refer to effect sizes that were reported in the original article (e.g., Timko et al., 2010; Study 1, correlation between overall D score and DASS-total). It is possible that these effects were obtained by contacting the original authors, but this is not reported in Vahey et al.
  • Some were not reported in sufficient detail in the original paper to allow for the calculation of an effect size, and contact with the original authors did not result in data/result being provided (e.g., Parling et al., 2012).
  • Non-criterion effects that referred only to the presence of an IRAP effect, but not the covariation between IRAP scores and an external variable, were not reextracted or converted. This was on the basis that this work is labor intensive, these effects do not meet the inclusion criteria, and therefore these effects would not be used in the subsequent new meta analysis. This point is discussed further below.
data_reextracted_and_annotated <- read.csv("../data/data_reextracted_and_annotated.csv")

# get effect sizes
data_individual_effect_sizes <- data_reextracted_and_annotated %>%
  # consider only those effect sizes employed in vahey et al.'s meta-analysis
  filter(es_included_in_vahey_meta == TRUE) %>%
  # create an simplified effect size identifier
  group_by(article) %>%
  mutate(es_number = row_number()) %>%
  ungroup() %>%
  select(article, es_number, r_vahey, r_from_paper, n_from_paper) 

# table
data_individual_effect_sizes %>%
  summarize(n_reextracted = sum(!is.na(r_from_paper)),
            percent_reextracted = round_half_up(sum(!is.na(r_from_paper))/n(), 3)*100) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
n_reextracted percent_reextracted
29 51.8
# assess congruence
data_individual_effect_sizes_congruence <- data_individual_effect_sizes %>%
  na.omit() %>%
  mutate(congruence = ifelse(round_half_up(r_from_paper, 2) == round_half_up(r_vahey, 2), "Congruent", "Incongruent"),
         congruence = fct_relevel(congruence, "Congruent", "Incongruent"),
         difference = round_half_up(r_from_paper - r_vahey, 2))

# plot
p_comparison_reextracted <- 
  ggplot(data_individual_effect_sizes_congruence, aes(r_vahey, r_from_paper)) +
  geom_abline(slope = 1, linetype = "dotted") +
  geom_point(aes(color = congruence, shape = congruence), size = 2.25) +
  theme_classic() +
  scale_color_viridis_d(begin = 0.25, end = 0.75) + 
  theme(legend.position = "top", # c(0.25, 0.85),
        legend.title = element_blank()) + 
        #legend.box.background = element_rect(colour = "black")) +
  xlim(0, 0.7) +
  ylim(0, 0.7) +
  xlab("Reported in Vahey et al.'s (2015)\nsupplementary materials") +
  ylab("Reextracted from original articles\n") +
  annotate("text", x = .54, y = .175, size = 3, label = "Overestimates validity") +
  annotate("text", x = .175, y = .54, size = 3, label = "Underestimates valdiity")

p_comparison_reextracted

ggplot2::ggsave(filename = "plots/scatter plot comparing effect sizes in supplementary materials to reextracted effect sizes.pdf", 
                plot = p_comparison_reextracted,
                device = "pdf",
                width = 4.5, 
                height = 4.5, 
                units = "in")

# table
data_individual_effect_sizes_congruence %>%
  summarize(n_incongruent = sum(ifelse(congruence == "Incongruent", TRUE, FALSE)),
            percent_incongruent = round_half_up(mean(ifelse(congruence == "Incongruent", TRUE, FALSE)), 2)*100) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
n_incongruent percent_incongruent
13 45
data_individual_effect_sizes_congruence %>%
  filter(difference != 0) %>%
  select(article, r_vahey, r_from_paper, difference) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
article r_vahey r_from_paper difference
Carpenter et al. (2012) 0.47 0.03 -0.44
Carpenter et al. (2012) 0.23 0.19 -0.04
Carpenter et al. (2012) 0.40 0.18 -0.22
Nicholson, Dempsey et al. (2014) 0.22 0.23 0.01
Nicholson, McCourt et al. (2013) 0.50 0.34 -0.16
Parling et al. (2012) 0.25 0.24 -0.01
Remue, et al. (2013) 0.50 0.37 -0.13
Remue, et al. (2013) 0.37 0.50 0.13
Remue, et al. (2013) 0.58 0.39 -0.19
Remue, et al. (2013) 0.38 0.23 -0.15
Timko et al. (2010; Study 1) 0.45 0.42 -0.03
Vahey et al. (2009) 0.44 0.46 0.02
Vahey et al. (2010) 0.25 0.26 0.01

Table includes those that differed.

Critique of Vahey et al.’s inclusions

Vahey et al (2015) stated: “To be included within the current meta-analysis a given statistical effect must have described the co-variation of an IRAP effect with a corresponding clinically-focused criterion variable. To qualify as clinically-focused, the IRAP and criterion variables must have been deemed to target some aspect of a condition included in a major psychiatric diagnostic scheme such as the Diagnostic and Statistical Manual of Mental Disorders (DSM-5, 2013).”

Vahey et al.’s own inclusion criteria state that they would include, as evidence of criterion validity, effects that describe co-variation between IRAP scores and an external variable. However, many of their included effect sizes involve no external variable, only the presence of a non-zero IRAP effect at the group level. IRAP effects without reference to an external variable that cannot provide criterion validity, just as a group’s mean score on a self-report cannot in isolation provide evidence for that measure’s criterion validity.

Number and proportion of Vahey et al.’s 56 effect sizes from their supplementary materials that cannot provide evidence of criterion validity:

data_reextracted_and_annotated %>%
  filter(es_included_in_vahey_meta == TRUE) %>%
  mutate(non_criterion_validity_effect = specific_criterion_variable_type == "IRAP effect significance from zero test") %>%
  summarize(non_criterion_validity_effect_count = sum(non_criterion_validity_effect),
            non_criterion_validity_effect_proportion = mean(non_criterion_validity_effect)) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
non_criterion_validity_effect_count non_criterion_validity_effect_proportion
21 0.375

Critique of Vahey et al.’s exclusions/omissions

Evidence from Vahey et al.’s choices of what to include vs not is at times hard to defend. Eg vahey et al 2009; comparisons between Nicholson et al. studies.

many effects that would seem to meet the criterion of being clinically relevant, but which were smaller in magnitude

EXPAND WITH EXAMPLES

Comprehensive extraction of effect sizes

Vahey et al. reported extracting 56 effect sizes prior to exclusions: “Collectively, these 15 articles yielded 56 statistical effects between various clinically focused IRAP effects and their respective criterion variables.” (p.60)

data_reextracted <- read.csv("../data/data_reextracted_and_annotated.csv") %>%
  # mark each effect as clinically relevant only if both raters scored it as such
  rowwise() %>%
  mutate(clinically_relevant = as.logical(max(c(clinically_relevant_criterion_rater_1,
                                                clinically_relevant_criterion_rater_2)))) %>%
  ungroup()

total_n <- data_reextracted %>%
  distinct(article, n_from_paper) %>%
  summarize(total_n = sum(n_from_paper)) %>%
  pull(total_n)

I reextracted effect sizes from these same 15 articles and found 322 effect sizes.

Application of exclusion criteria

Vahey et al. reported 100% inter rater reliability for the scoring of the suitability of effect sizes for inclusion: “From the outset, there was no disagreement between the authors about what statistical effects should be excluded from the meta-analysis; and of the 56 statistical effects only 8 (i.e. 14%) were not initially cited by both authors for inclusion.” (p.60)

They do not provide information about the excluded 8 effect sizes.

Vahey et al stated that “The authors decided whether the responses measured by a given IRAP trial-type should co-vary with a specific criterion variable by consulting the relevant empirical literature. In the absence of such evidence, the authors strictly excluded even substantial statistical effects between IRAP effects and accompanying variables from the current meta-analysis.” Note however that Vahey et al. did not provide citations of relevant literature that support the inclusion of any given effect. As such, this high rate of agreement between Vahey & Nicholson does not necessarily imply that the inclusion and exclusion criteria would achieve the same level of agreement by others.

The comprehensively extracted effect sizes were rated by two independent reviewers.

data_raters <- data_reextracted %>%
  select(clinically_relevant_criterion_rater_1, clinically_relevant_criterion_rater_2)

data_raters %>%
  mutate(agreement = ifelse(clinically_relevant_criterion_rater_1 == clinically_relevant_criterion_rater_2, TRUE, FALSE)) %>%
  summarize(interrater_percent_agreement = round_half_up(mean(agreement, na.rm = TRUE), 1)) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
interrater_percent_agreement
0.9
kappa2(data_raters, "unweighted")
##  Cohen's Kappa for 2 Raters (Weights: unweighted)
## 
##  Subjects = 281 
##    Raters = 2 
##     Kappa = 0.879 
## 
##         z = 14.8 
##   p-value = 0

After exclusions, 144 effect sizes remained for inclusion in the new meta-analysis, compared to Vahey et al.’s 56.

Extended analyses

Meta-analysis including all effects meeting Vahey et al.’s own inclusion and exclusion criteria

Excluding non-clinically relevant and effects that do not involve a relationship with an external variable, which cannot provide evidence of criterion validity.

Details of the meta-analytic strategy, which used contemporary methods:

  • Weighting by inverse variance rather than sample size, as sample size provides a poorer proxy of measurement error. This is now more standard practice in meta-analysis.
  • Fishers r-to-z transformations to deal with ceiling effects on confidence intervals.
  • REML estimator rather than Hunter & Schmidt.
  • Multilevel meta-analysis model rather than weighted-averaging of effect sizes (random intercept for source study).
# exclusions and transformations
data_for_meta_new <- data_reextracted %>%
  filter(problematic_analysis == FALSE & clinically_relevant == TRUE) %>%
  mutate(variables = paste(irap_variable, criterion_variable, sep = "-")) %>%
  dplyr::select(article, variables, r_from_paper, n_from_paper, authors_include_bh) %>%
  na.omit() %>%
  escalc(measure = "ZCOR", 
         ri = r_from_paper, 
         ni = n_from_paper,
         data = ., 
         slab = paste(article, variables), 
         digits = 8) %>%
  rename(ni = n_from_paper) %>%
  dplyr::select(article, variables, yi, vi, ni, authors_include_bh) %>%
  na.omit() %>%
  group_by(article) %>%
  mutate(es_number = row_number(),
         article_abbreviated = paste(article, es_number)) %>%
  ungroup() %>%
  mutate(lower = yi - sqrt(vi)*1.96,
         upper = yi + sqrt(vi)*1.96)

# fit 
fit_new <- 
  rma.mv(yi     = yi, 
         V      = vi, 
         random = ~ 1 | article, 
         method = "REML", 
         data   = data_for_meta_new,
         slab   = article_abbreviated)

# save to disk for pdf plots
#write_rds(fit_new, "models/fit_new.rds")

predictions_new <- 
  predict(fit_new) %>%
  as.data.frame() %>%
  select(-se) 

predictions_new_backtransformed <- predictions_new %>%
  mutate_all(transf.ztor) %>%
  round_df(2)

# plot
data_forest_new <- data_for_meta_new %>%
  drop_na() %>%
  select(article_abbreviated,
         n = ni,
         r = yi,
         lower,
         upper) %>%
  # bind_rows(tibble(article_abbreviated = "Meta",
  #                  n = 35, # arbitrary number to make the diamond an appropriate size
  #                  r = predictions_new$pred,
  #                  lower = predictions_new$ci.lb,
  #                  upper = predictions_new$ci.ub)) %>%
  bind_rows(tibble(article_abbreviated = c("Meta (confidence interval)", "Meta (credibility interval)"),
                   n = 35, # arbitrary number to make the diamond an appropriate size
                   r = c(predictions_new$pred, predictions_new$pred),
                   lower = c(predictions_new$ci.lb, predictions_new$pi.lb),
                   upper = c(predictions_new$ci.ub, predictions_new$pi.ub))) %>%
  mutate(r = transf.ztor(r),
         lower = transf.ztor(lower),
         upper = transf.ztor(upper)) %>%
  mutate(size = n/sum(n),
         n = ifelse(str_detect(article_abbreviated, "Meta"), total_n, n),
         article_abbreviated = fct_relevel(article_abbreviated,
                                           "Carpenter et al. (2012) 1", "Carpenter et al. (2012) 10", 
                                           "Carpenter et al. (2012) 11", "Carpenter et al. (2012) 12", "Carpenter et al. (2012) 13", 
                                           "Carpenter et al. (2012) 14", "Carpenter et al. (2012) 15", "Carpenter et al. (2012) 16", 
                                           "Carpenter et al. (2012) 17", "Carpenter et al. (2012) 18", "Carpenter et al. (2012) 19", 
                                           "Carpenter et al. (2012) 2", "Carpenter et al. (2012) 20", "Carpenter et al. (2012) 3", 
                                           "Carpenter et al. (2012) 4", "Carpenter et al. (2012) 5", "Carpenter et al. (2012) 6", 
                                           "Carpenter et al. (2012) 7", "Carpenter et al. (2012) 8", "Carpenter et al. (2012) 9", 
                                           "Dawson et al. (2009) 1", "Dawson et al. (2009) 2", "Dawson et al. (2009) 3", 
                                           "Dawson et al. (2009) 4", "Dawson et al. (2009) 5", "Dawson et al. (2009) 6", 
                                           "Hussey & Barnes-Holmes (2012) 1", "Hussey & Barnes-Holmes (2012) 10", 
                                           "Hussey & Barnes-Holmes (2012) 11", "Hussey & Barnes-Holmes (2012) 12", 
                                           "Hussey & Barnes-Holmes (2012) 13", "Hussey & Barnes-Holmes (2012) 14", 
                                           "Hussey & Barnes-Holmes (2012) 15", "Hussey & Barnes-Holmes (2012) 16", 
                                           "Hussey & Barnes-Holmes (2012) 17", "Hussey & Barnes-Holmes (2012) 18", 
                                           "Hussey & Barnes-Holmes (2012) 19", "Hussey & Barnes-Holmes (2012) 2", 
                                           "Hussey & Barnes-Holmes (2012) 20", "Hussey & Barnes-Holmes (2012) 21", 
                                           "Hussey & Barnes-Holmes (2012) 22", "Hussey & Barnes-Holmes (2012) 23", 
                                           "Hussey & Barnes-Holmes (2012) 24", "Hussey & Barnes-Holmes (2012) 25", 
                                           "Hussey & Barnes-Holmes (2012) 26", "Hussey & Barnes-Holmes (2012) 27", 
                                           "Hussey & Barnes-Holmes (2012) 28", "Hussey & Barnes-Holmes (2012) 29", 
                                           "Hussey & Barnes-Holmes (2012) 3", "Hussey & Barnes-Holmes (2012) 30", 
                                           "Hussey & Barnes-Holmes (2012) 4", "Hussey & Barnes-Holmes (2012) 5", 
                                           "Hussey & Barnes-Holmes (2012) 6", "Hussey & Barnes-Holmes (2012) 7", 
                                           "Hussey & Barnes-Holmes (2012) 8", "Hussey & Barnes-Holmes (2012) 9", 
                                           "Nicholson & Barnes-Holmes (2012b) 1", "Nicholson & Barnes-Holmes (2012b) 10", 
                                           "Nicholson & Barnes-Holmes (2012b) 2", "Nicholson & Barnes-Holmes (2012b) 3", 
                                           "Nicholson & Barnes-Holmes (2012b) 4", "Nicholson & Barnes-Holmes (2012b) 5", 
                                           "Nicholson & Barnes-Holmes (2012b) 6", "Nicholson & Barnes-Holmes (2012b) 7", 
                                           "Nicholson & Barnes-Holmes (2012b) 8", "Nicholson & Barnes-Holmes (2012b) 9", 
                                           "Nicholson, Dempsey et al. (2014) 1", "Nicholson, Dempsey et al. (2014) 10", 
                                           "Nicholson, Dempsey et al. (2014) 11", "Nicholson, Dempsey et al. (2014) 12", 
                                           "Nicholson, Dempsey et al. (2014) 13", "Nicholson, Dempsey et al. (2014) 14", 
                                           "Nicholson, Dempsey et al. (2014) 15", "Nicholson, Dempsey et al. (2014) 16", 
                                           "Nicholson, Dempsey et al. (2014) 17", "Nicholson, Dempsey et al. (2014) 18", 
                                           "Nicholson, Dempsey et al. (2014) 19", "Nicholson, Dempsey et al. (2014) 2", 
                                           "Nicholson, Dempsey et al. (2014) 20", "Nicholson, Dempsey et al. (2014) 21", 
                                           "Nicholson, Dempsey et al. (2014) 22", "Nicholson, Dempsey et al. (2014) 3", 
                                           "Nicholson, Dempsey et al. (2014) 4", "Nicholson, Dempsey et al. (2014) 5", 
                                           "Nicholson, Dempsey et al. (2014) 6", "Nicholson, Dempsey et al. (2014) 7", 
                                           "Nicholson, Dempsey et al. (2014) 8", "Nicholson, Dempsey et al. (2014) 9", 
                                           "Nicholson, McCourt et al. (2013) 1", "Nicholson, McCourt et al. (2013) 10", 
                                           "Nicholson, McCourt et al. (2013) 2", "Nicholson, McCourt et al. (2013) 3", 
                                           "Nicholson, McCourt et al. (2013) 4", "Nicholson, McCourt et al. (2013) 5", 
                                           "Nicholson, McCourt et al. (2013) 6", "Nicholson, McCourt et al. (2013) 7", 
                                           "Nicholson, McCourt et al. (2013) 8", "Nicholson, McCourt et al. (2013) 9", 
                                           "Timko et al. (2010; Study 1) 1", "Timko et al. (2010; Study 1) 10", 
                                           "Timko et al. (2010; Study 1) 11", "Timko et al. (2010; Study 1) 12", 
                                           "Timko et al. (2010; Study 1) 13", "Timko et al. (2010; Study 1) 14", 
                                           "Timko et al. (2010; Study 1) 15", "Timko et al. (2010; Study 1) 16", 
                                           "Timko et al. (2010; Study 1) 17", "Timko et al. (2010; Study 1) 18", 
                                           "Timko et al. (2010; Study 1) 19", "Timko et al. (2010; Study 1) 2", 
                                           "Timko et al. (2010; Study 1) 20", "Timko et al. (2010; Study 1) 21", 
                                           "Timko et al. (2010; Study 1) 22", "Timko et al. (2010; Study 1) 23", 
                                           "Timko et al. (2010; Study 1) 24", "Timko et al. (2010; Study 1) 25", 
                                           "Timko et al. (2010; Study 1) 26", "Timko et al. (2010; Study 1) 27", 
                                           "Timko et al. (2010; Study 1) 28", "Timko et al. (2010; Study 1) 29", 
                                           "Timko et al. (2010; Study 1) 3", "Timko et al. (2010; Study 1) 30", 
                                           "Timko et al. (2010; Study 1) 31", "Timko et al. (2010; Study 1) 32", 
                                           "Timko et al. (2010; Study 1) 4", "Timko et al. (2010; Study 1) 5", 
                                           "Timko et al. (2010; Study 1) 6", "Timko et al. (2010; Study 1) 7", 
                                           "Timko et al. (2010; Study 1) 8", "Timko et al. (2010; Study 1) 9", 
                                           "Timko et al. (2010; Study 2) 1", "Timko et al. (2010; Study 2) 10", 
                                           "Timko et al. (2010; Study 2) 11", "Timko et al. (2010; Study 2) 12", 
                                           "Timko et al. (2010; Study 2) 2", "Timko et al. (2010; Study 2) 3", 
                                           "Timko et al. (2010; Study 2) 4", "Timko et al. (2010; Study 2) 5", 
                                           "Timko et al. (2010; Study 2) 6", "Timko et al. (2010; Study 2) 7", 
                                           "Timko et al. (2010; Study 2) 8", "Timko et al. (2010; Study 2) 9",
                                           #"Meta"
                                           "Meta (confidence interval)", 
                                           "Meta (credibility interval)")) %>%
  mutate(` ` = paste(rep(" ", 30), collapse = " ")) %>%
  select(Article = article_abbreviated, ` `, r, lower, upper, n, size) %>%
  mutate(r     = round_half_up(r, 2),
         lower = round_half_up(lower, 2),
         upper = round_half_up(upper, 2))

p_new <- 
  forestploter::forest(data       = select(data_forest_new, -size),
                       est        = data_forest_new$r,
                       lower      = data_forest_new$lower, 
                       upper      = data_forest_new$upper,
                       sizes      = 1/data_forest_new$size,
                       # is_summary = c(rep(FALSE, nrow(data_forest_new)-1), TRUE),
                       is_summary = c(rep(FALSE, nrow(data_forest_new)-2), TRUE, TRUE),
                       ci_column  = 2,
                       ref_line   = 0,
                       theme      = custom_theme,
                       xlim       = c(-0.7, 0.9),
                       ticks_at   = c(-.6, -.4, -.2, 0, .2, .4, .6, .8))

p_new

ggplot2::ggsave(filename = "plots/forest plot new multilevel model.pdf", 
                plot = p_new,
                device = "pdf",
                width = 7.5, 
                height = 32, 
                units = "in")

Meta effect: k = 142, r = 0.18, 95% CI [0.1, 0.26], 95% CR [-0.04, 0.39], p = 0.00001.

Power analyses

Less appropriate one-tailed:

  • Minimum sample size using r = 0.18, 189 participants needed to have 80% power, alpha = .05, one-tailed.
  • Minimum sample size using lower CI r = 0.1, 617 participants needed to have 80% power, alpha = .05, one-tailed.
  • This suggested sample sizes are more than 7 times that recommended by Vahey et al.

More appropriate two-tailed:

  • Minimum sample size using r = 0.18, 240 participants needed to have 80% power, alpha = .05, two-tailed.
  • Minimum sample size using lower CI r = 0.1, 782 participants needed to have 80% power, alpha = .05, two-tailed.
  • This suggested sample sizes are more than 8 times that recommended by Vahey et al.

Conceptual critique: Analyses of the IRAP as the DV rather than the DV

Incompatible with the idea that the IRAP can be used for clinical assessment. This would require that IRAP tells us something about individuals’ clinical status/group/location on a continuum/etc (i.e., IRAP effects -> individuals). Most of the effect sizes come from analyses that assume the opposite, i.e., that individuals’ clinical status/group/location on a continuum/etc. can tell us about their IRAP effects (i.e., individuals -> IRAP effects).

Even this new meta-analysis is mostly informative only for researchers wishing to study IRAP effects, rather than study individuals using the IRAP.

Similar conceptual criticisms have been made elsewhere. Schmall et al (2016) reported results from a very large study which concluded that hippocampal volume ‘robustly discriminate[d] MDD patients from healthy controls’ (p. 1). However, Fried & Kievit (2016) pointed out that this was not the case: depression status predicted a detectable difference in mean hippocampal volume, but hippocampal volume was very weakly associated with individuals’ depression status. They point out that Schmall et al had reversed their analysis’ DV and IV when making conclusions. This error is equally applicable to Vahey et al: even if their meta analysis was computationally replicable, its conclusions would still not support their claim that the meta analysis “demonstrates the potential of the IRAP as a tool for clinical assessment” (p. 64). Rather, it would demonstrate that known clinical status (etc) has potential as a tool for the assessment of IRAP scores.